{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Experimenting with OpenAI GPT\n",
    "\n",
    "This notebook is part of [AI for Beginners Curriculum](http://aka.ms/ai-beginners).\n",
    "\n",
    "In this notebook, we will explore how we can play with OpenAI-GPT model using Hugging Face `transformers` library.\n",
    "\n",
    "Without further ado, let's instantiate text generating pipeline and start generating! "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "c:\\Users\\bethanycheum\\Desktop\\AI-For-Beginners\\.venv\\lib\\site-packages\\tqdm\\auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
      "  from .autonotebook import tqdm as notebook_tqdm\n",
      "Downloading model.safetensors: 100%|██████████| 479M/479M [04:28<00:00, 1.78MB/s] \n",
      "c:\\Users\\bethanycheum\\Desktop\\AI-For-Beginners\\.venv\\lib\\site-packages\\huggingface_hub\\file_download.py:133: UserWarning: `huggingface_hub` cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in C:\\Users\\bethanycheum\\.cache\\huggingface\\hub. Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the `HF_HUB_DISABLE_SYMLINKS_WARNING` environment variable. For more details, see https://huggingface.co/docs/huggingface_hub/how-to-cache#limitations.\n",
      "To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development\n",
      "  warnings.warn(message)\n",
      "Some weights of OpenAIGPTLMHeadModel were not initialized from the model checkpoint at openai-gpt and are newly initialized: ['position_ids']\n",
      "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n",
      "Downloading (…)neration_config.json: 100%|██████████| 74.0/74.0 [00:00<00:00, 48.8kB/s]\n",
      "Downloading (…)olve/main/vocab.json: 100%|██████████| 816k/816k [00:00<00:00, 1.76MB/s]\n",
      "Downloading (…)olve/main/merges.txt: 100%|██████████| 458k/458k [00:00<00:00, 1.11MB/s]\n",
      "Downloading (…)/main/tokenizer.json: 100%|██████████| 1.27M/1.27M [00:00<00:00, 2.12MB/s]\n",
      "Xformers is not installed correctly. If you want to use memory_efficient_attention to accelerate training use the following command to install Xformers\n",
      "pip install xformers.\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "[{'generated_text': \"Hello! I am a neural network, and I want to say that i apologize for not coming to you yourself, for not helping you, and that i was too busy getting dressed and studying for a midterm. you know, the kind where the teachers are like that and they come in pairs with their boyfriends, but not with theirs. it's true, that i have had a girlfriend, and i'm only going on wednesdays and thursdays because i was too busy with college, but maybe\"},\n",
       " {'generated_text': 'Hello! I am a neural network, and I want to say that we have been blessed with a wonderful gift ; no one of us has died at all. and our spirits are strong, very strong. in one very lucky moment of luck for you, all has been given direction and destiny, and for us there are no more mysteries. the earth has been chosen for you, and that earth is now ours, and you must be forever in our hearts. \" \\n the words, as one,'},\n",
       " {'generated_text': 'Hello! I am a neural network, and I want to say that if you would just turn and face the general, you would have a nice day. \" \\n \" sure thing, \" said one of the soldiers, and started to run. the rest of the soldiers followed, shouting. the general turned to general zulu, raising his arm. the general said something in his native language, and the general immediately started to run. zulu started to move toward the wall, with the'},\n",
       " {'generated_text': 'Hello! I am a neural network, and I want to say that i am not a doctor but an anthropologist to you, a specialist, a specialist in the field of astrobiological biology, and that i am very much involved in this investigation. i am not sure, i am not certain, but i can confirm your conclusions and therefore i will go to the top. i have a colleague who has just returned from this expedition and his findings confirm that you are a specialist. that is, he'},\n",
       " {'generated_text': \"Hello! I am a neural network, and I want to say that everyone here is in agreement that no matter how many times i say to myself,'he was never a man of action on the battlefield,'or'he 'll never take a chance at killing any civilians,'or'he 'll never let his men go undefended against enemy forces of this caliber,'or'that's just what i need in a day like today. \\n you see, there are only three groups that\"}]"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from transformers import pipeline\n",
    "\n",
    "model_name = 'openai-gpt' \n",
    "\n",
    "generator = pipeline('text-generation', model=model_name)\n",
    "\n",
    "generator(\"Hello! I am a neural network, and I want to say that\", max_length=100, num_return_sequences=5)\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Prompt Engineering\n",
    "\n",
    "In some of the problems, you can use openai-gpt generation right away by designing correct prompts. Have a look at the examples below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'generated_text': 'Synonyms of a word cat: the same cat i used to stare at, and you in'},\n",
       " {'generated_text': 'Synonyms of a word cat: cat of the woods, cat of the hills, cat of'},\n",
       " {'generated_text': 'Synonyms of a word cat: you! \\n \" it\\'s a girl. \" i said'},\n",
       " {'generated_text': \"Synonyms of a word cat: big cat. but how come, we didn't hear it\"},\n",
       " {'generated_text': 'Synonyms of a word cat: \" mea - o - c \" which makes them sound'}]"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "generator(\"Synonyms of a word cat:\", max_length=20, num_return_sequences=5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'generated_text': 'I love when you say this -> Positive\\nI have myself -> Negative\\nThis is awful for you to say this -> positive this is so horrible - > positive that your brother is gay - >'},\n",
       " {'generated_text': 'I love when you say this -> Positive\\nI have myself -> Negative\\nThis is awful for you to say this -> negative i will bring this on you -, < positive am i, i'},\n",
       " {'generated_text': 'I love when you say this -> Positive\\nI have myself -> Negative\\nThis is awful for you to say this -> negative i have self - esteem i must take it - : \\n - -'},\n",
       " {'generated_text': 'I love when you say this -> Positive\\nI have myself -> Negative\\nThis is awful for you to say this -> negative this is - : \\n if it were true that the devil would have'},\n",
       " {'generated_text': \"I love when you say this -> Positive\\nI have myself -> Negative\\nThis is awful for you to say this -> positive i have you - > positive it's a bad thing, > positive\"}]"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "generator(\"I love when you say this -> Positive\\nI have myself -> Negative\\nThis is awful for you to say this ->\", max_length=40, num_return_sequences=5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'generated_text': 'Translate English to French: cat => chat, dog => chien, student =>  new and unusual. there were no more words to be'},\n",
       " {'generated_text': 'Translate English to French: cat => chat, dog => chien, student =>  student \\n his eyes were huge in his lean face as'},\n",
       " {'generated_text': \"Translate English to French: cat => chat, dog => chien, student =>  the teacher's words, their words, their words.\"}]"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "generator(\"Translate English to French: cat => chat, dog => chien, student => \", top_k=50, max_length=30, num_return_sequences=3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'generated_text': 'People who liked the movie The Matrix also liked  it, and there was the movie of the first man after us. \\n i wanted to laugh at how stupid these stupid actors were. no, they were'},\n",
       " {'generated_text': \"People who liked the movie The Matrix also liked  the movie, and the film was the result. and that's when the man in the story was brought into reality, after a few decades. \\n a\"},\n",
       " {'generated_text': 'People who liked the movie The Matrix also liked  the movie the matrix, because there was a very old movie movie called the matrix, where there was a great super hero, and the super hero came out'},\n",
       " {'generated_text': \"People who liked the movie The Matrix also liked  the movie that didn't have a chance to pay cash, if they could afford it. most often they got a good deal and a lot of money,\"},\n",
       " {'generated_text': \"People who liked the movie The Matrix also liked  the movie, and i didn't seem to have the same problem. \\n i 'd met the other half of my family. i spent most of my time\"}]"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "generator(\"People who liked the movie The Matrix also liked \", max_length=40, num_return_sequences=5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Text Sampling Strategies\n",
    "\n",
    "So far we have been using simple **greedy** sampling strategy, when we selected next word based on the highest probability. Here is how it works:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'generated_text': 'It was early evening when I can back from work. I usually work late, but this time it was an exception. When I entered a room, I saw my friend, a young man, sprawled across the bed in his bed. \\n \" hi, i\\'m mike eptirard. \" \\n there was silence on the other side of the door. i listened for any trace of life but there was nothing. my heart began to pound, i was starting to sweat, i took out my wallet'},\n",
       " {'generated_text': 'It was early evening when I can back from work. I usually work late, but this time it was an exception. When I entered a room, I saw my mother on the bed, hugging her legs to her chest and sobbing. i saw my dad and mother from the corner of my eye. \\n elfin face was covered in tears as i entered the room. my dad and mother also wept ; just as they did every other time i came to work. but this time, they had different faces'},\n",
       " {'generated_text': 'It was early evening when I can back from work. I usually work late, but this time it was an exception. When I entered a room, I saw the room had changed because it was dark. it still smelled like a hospital. a new light shined through from a vent in the ceiling. i found myself in a bathroom and a small room with a sink and a wall of glass. the bathroom billion years ago. not so different from all of the rest of the apartment. \\n now...'},\n",
       " {'generated_text': 'It was early evening when I can back from work. I usually work late, but this time it was an exception. When I entered a room, I saw a large woman with dark hair and pale skin. she was asleep, but i noticed a faint movement of her face. i could sense she was awake. i got up and walked over to her. \\n \" hello miss. i am inspector michael o\\'dell ; we are investigating the case against you. i wanted to ask if you were the'},\n",
       " {'generated_text': 'It was early evening when I can back from work. I usually work late, but this time it was an exception. When I entered a room, I saw i had an empty table and three empty chairs. that was all i needed. i had left a note on a table in the center of the room and had a pen in hand. \" \\n \" i think what you were doing was something he was doing to her. \" \\n \" yeah, \" i nodded with a grin. \" i'}]"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "prompt = \"It was early evening when I can back from work. I usually work late, but this time it was an exception. When I entered a room, I saw\"\n",
    "generator(prompt,max_length=100,num_return_sequences=5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Beam Search** allows the generator to explore several directions (*beams*) of text generation, and select the ones with highers overall score. You can do beam search by providing `num_beams` parameter. You can also specify `no_repeat_ngram_size` to penalize the model for repeating n-grams of a given size: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'generated_text': 'It was early evening when I can back from work. I usually work late, but this time it was an exception. When I entered a room, I saw a man sitting in a chair with his head in his hands. he didn\\'t look up as i approached. \\n \" excuse me, sir, \" i said. \" can i help you? \" \\n the man looked up at me. his eyes were red - rimmed and his face was pale, as if he hadn\\'t slept in days'},\n",
       " {'generated_text': 'It was early evening when I can back from work. I usually work late, but this time it was an exception. When I entered a room, I saw a man sitting at a desk in the middle of the room. he had his back to me, so i couldn\\'t see what he was doing. \" \\n \" what did he look like? \" i asked as i sat down on the bed next to her. \\n she took a deep breath and looked at me with tears in her eyes'},\n",
       " {'generated_text': 'It was early evening when I can back from work. I usually work late, but this time it was an exception. When I entered a room, I saw a woman sitting on the bed, reading a book. she looked up at me and smiled. \\n \" hi, \" she said. \" can i help you? \" \\n i sat down next to her and looked around the room. the walls were white, and there was a large window in the middle of the wall that looked out on'},\n",
       " {'generated_text': 'It was early evening when I can back from work. I usually work late, but this time it was an exception. When I entered a room, I saw a man sitting at a table in the middle of the room. he looked up as i walked in, and when he saw me, he got up and walked over to me. \\n \" can i help you? \" he asked as he put his hand on the small of my back and led me to a chair at the other end of'},\n",
       " {'generated_text': 'It was early evening when I can back from work. I usually work late, but this time it was an exception. When I entered a room, I saw a woman sitting on the edge of her bed, reading a book. she looked up at me and smiled. \\n \" hello, \" she said. \" can i help you? \" \\n i didn\\'t know what to say, so i just sat down in the chair next to the bed and looked at her. her hair was dark brown'}]"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "prompt = \"It was early evening when I can back from work. I usually work late, but this time it was an exception. When I entered a room, I saw\"\n",
    "generator(prompt,max_length=100,num_return_sequences=5,num_beams=10,no_repeat_ngram_size=2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Sampling** selects the next word non-deterministically, using the probability distribution returned by the model. You turn on sampling using `do_sample=True` parameter. You can also specify `temperature`, to make the model more or less deterministic."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'generated_text': 'It was early evening when I can back from work. I usually work late, but this time it was an exception. When I entered a room, I saw her. she was on the bed, but she looked very different. \\n \" honey, what\\'s the matter? \" i asked. \\n she sat up. \" i can\\'t believe it\\'s real. i\\'ve been dreaming about you for the last two days. \" \\n \" i can\\'t believe it either. i guess that\\'s how'}]"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "prompt = \"It was early evening when I can back from work. I usually work late, but this time it was an exception. When I entered a room, I saw\"\n",
    "generator(prompt,max_length=100,do_sample=True,temperature=0.8)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also provide to additional parameters to sampling:\n",
    "* `top_k` specifies the number of word options to consider when using sampling. This minimizes the chance of getting weird (low-probability) words in our text.\n",
    "* `top_p` is similar, but we chose the smallest subset of most probable words, whose total probability is larger than p.\n",
    "\n",
    "Feel free to experiment with adding those parameters in."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Fine-Tuning your models\n",
    "\n",
    "You can also [fine-tune your model](https://learn.microsoft.com/en-us/azure/cognitive-services/openai/how-to/fine-tuning?pivots=programming-language-studio?WT.mc_id=academic-77998-bethanycheum) on your own dataset. This will allow you to adjust the style of text, while keeping the major part of language model. "
   ]
  }
 ],
 "metadata": {
  "interpreter": {
   "hash": "16af2a8bbb083ea23e5e41c7f5787656b2ce26968575d8763f2c4b17f9cd711f"
  },
  "kernelspec": {
   "display_name": "Python 3.8.12 ('py38')",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.11"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
